ModEx and Seed-Detective: Two novel techniques for high quality clustering by using good initial seeds in K-Means

نویسندگان

  • Md Anisur Rahman
  • Md Zahidul Islam
  • Terry Bossomaier
چکیده

Clustering; Classification; K-Means; Cluster evaluation; Data mining Abstract In this paper we present two clustering techniques called ModEx and Seed-Detective. ModEx is a modified version of an existing clustering technique called Ex-Detective. It addresses some limitations of Ex-Detective. Seed-Detective is a combination of ModEx and Simple KMeans. Seed-Detective uses ModEx to produce a set of high quality initial seeds that are then given as input to K-Means for producing the final clusters. The high quality initial seeds are expected to produce high quality clusters through K-Means. The performances of Seed-Detective and ModEx are compared with the performances of Ex-Detective, PAM, Simple K-Means (SK), Basic Farthest Point Heuristic (BFPH) and New Farthest Point Heuristic (NFPH). We use three cluster evaluation criteria namely F-measure, Entropy and Purity and four natural datasets that we obtain from the UCI Machine learning repository. In the datasets our proposed techniques perform better than the existing techniques in terms of F-measure, Entropy and Purity. The sign test results suggest a statistical significance of the superiority of Seed-Detective (and ModEx) over the existing techniques. a 2015 The Authors. Production and hosting by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CCBY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Seed-Detective: A Novel Clustering Technique Using High Quality Seed for K-Means on Categorical and Numerical Attributes

In this paper we present a novel clustering technique called Seed-Detective. It is a combination of modified versions of two existing techniques namely Ex-Detective and Simple K-Means. Seed-Detective first discovers a set of preliminary clusters using our modified Ex-Detective. The modified Ex-Detective allows a data miner to assign different weights (importance levels) for all attributes, both...

متن کامل

Robust partitional clustering by outlier and density insensitive seeding

The leading partitional clustering technique, k-means, is one of the most computationally efficient clustering methods. However, it produces a local optimal solution that strongly depends on its initial seeds. Bad initial seeds can also cause the splitting or merging of natural clusters even if the clusters are well separated. In this paper, we propose, ROBIN, a novel method for initial seed se...

متن کامل

Automatically Finding Good Clusters with Seed K-Means

In finding biologically relevant groups of genes with gene expression data obtained by microarray technologies, the k-means clustering method is one of the most popular approaches due to its easiness to use and simplicity to implement. However, the randomness of k-means clustering method in choosing initial points to start with makes it impossible to obtain reliable results without much iterati...

متن کامل

A hybrid clustering technique combining a novel genetic algorithm with K-Means

Many existing clustering techniques including K-Means require a user input on the number of clusters. It is often extremely difficult for a user to accurately estimate the number of clusters in a data set. The genetic algorithms (GAs) generally determine the number of clusters automatically. However, they typically choose the genes and the number of genes randomly. If we can identify the right ...

متن کامل

DenClust: A Density Based Seed Selection Approach for K-Means

In this paper we present a clustering technique called DenClust that produces high quality initial seeds through a deterministic process without requiring an user input on the number of clusters k and the radius of the clusters r. The high quality seeds are given input to K-Means as the set of initial seeds to produce the final clusters. DenClust uses a density based approach for initial seed s...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015